Abstract

Reliance on the internet has given rise to a number of online networks, each with a distinct user base. Intentionally or not, we are all members of a wide range of social networks. Online interpersonal and professional interactions are significantly influenced by social networking. It has a tremendous effect at both the global and the individual scale, affecting a wide range of industries including education, healthcare, entertainment, banking, and telecommunications. As their dependency on social media increases, users publish a great deal of information about themselves online, leaving their data and themselves exposed and making them ideal targets for criminals. This not only jeopardizes the security of the social network’s data but also opens the way to a slew of other potentially harmful situations, ranging from identity theft to major cybercrime such as hacking, cyber-bullying, and cyber threats, and even national security threats such as terrorism. This has necessitated the development of methods and strategies to detect fraudulent users or abnormalities on social media. A graph is the most prominent mathematical model of a social network, hence devising methods to identify abnormalities in a graph is critical. This paper gives a thorough review of graph-based anomaly detection methods, with a focus on identifying anomalous subgraphs. Since anomaly detection on subgraphs has received little attention from the research community in contrast to other anomalous units, we examine the numerous research problems and outstanding questions in this domain.
1. Introduction

One cannot envision living without the internet in today’s day and age. It has become a necessary and integral element of our very existence. It was a revolution that transformed the world’s fundamental form of communication. From grocery shopping to socialization, it has now become the default mode of interaction in every aspect of modern life. It has evolved from its early days as a static network acting as a repository of knowledge maintained solely by professionals to being the world’s biggest computer network, undergoing many changes and becoming a worldwide parallel society in its own right. Throughout the short history of the Internet, the advent of Web 2.0 in the initial decade of the 21st century marked a turning point by encouraging the creation of interactive, crowd-sourced communication platforms. Online social networks emerged as a result of this (Mislove et al.).

From a basic platform for information exchange 
to a sophisticated interdisciplinary tool, internet 




Anagha Ajoykumar and Venkatesan M 


is now a tool for content creation, interaction, and even relaxation. In the previous decade, social networks have demonstrated their effectiveness in a variety of disciplines, with a massive surge in usage and applications. They connect people from all across the world and provide quick and easy communication, as in friends’ networks (Mislove et al.), co-authorship networks (Barabási et al.), mobile call networks (Nanavati et al.), e-mail communication networks (Al-Mukhaini, Al-Qayoudhi, and Al-Badi), and instant messenger networks (Nanavati et al.). Apart from that, they play a significant role in other vital aspects of society, such as academia (Curran and Hugh), healthcare (M. Lee, Yoon, and K.-S. Lee), legislation (Bright, Brewer, and Morselli), law enforcement (Garside et al.), and even more crucial areas, such as military and intelligence services (Willis and Delbaere) or pharmaceutical services.

Putting all of the positives aside, it is no surprise that social media has a negative side. Apart from the negative consequences for users, such as a strenuous lifestyle, sleep disruption, inattentiveness, procrastination, and an increased sense of social isolation, there are severe risks in social networks induced by the presence of malicious users or suspicious activities by these users, which overwhelm the remaining users and give way to illegal behaviour. Frauds and scams, along with breach of privacy, data theft, identity theft, misleading information, cyber bullying, cyber-attacks, and hacking, are serious concerns, with billions of fraudulent members on the network whose motives are unknown and may extend even to terrorism (Keyvanpour, Moradi, and Hasanzadeh) (Liu and Chawla) (R. Yu, He, and Liu) (Bindu and Thilagam). Hence, it is necessary to detect the presence of these fake users on the network and alert the legitimate users.


Finding anomalies in a social network is therefore crucial. Anomalies signify unusual or illegal behaviour that is not expected or visible during normal network operation. The anomalous unit may be a node, an edge, or a subgraph. To find these, we must look at how the various users of the network interact with one another (Wasserman and Faust). Since graphs are most commonly used to represent social networks, these anomalous units can be found using graph mining techniques. Different social networks call for different methods of spotting fraudulent units, because each one has a unique structure and set of features. Most social network researchers have developed tools and methodologies that identify abnormalities in social networks using structural properties.


Detection of anomalies in social networks offers a plethora of real-life applications. Due to the increase in fraudulent activity on social networks, more people are suffering financial loss and other harm. Individuals within the network, as well as unauthorised users, pose a threat to organisations. In addition to detecting fraudulent, untrustworthy, or dangerous behaviour, anomaly detection aids in the identification of significant users and rare activities in a network, such as uncommon connections between nodes. This highlights the significance of digital forensics in this setting. Social Network Forensics (Keyvanpour, Moradi, and Hasanzadeh) is a relatively new field of study that focuses on detecting, analysing, preventing, and predicting undesirable activities in social networks. With anomaly detection, this information may be used to keep the network safe and optimise its impact.


In this paper, we provide a comprehensive and systematic review of the research works done in the area of subgraph anomaly detection in social networks. The main contributions of this paper can be summarised as follows:


• Addressing the necessity for subgraph anomaly detection in social networks.

• Identifying the critical components associated with subgraph anomaly detection in social networks.

• Providing a comprehensive review of the state of the art in subgraph anomaly detection.

• Exploring the open problems and research challenges in the field of subgraph anomaly detection in social networks.


The rest of the paper is organized as follows. 
Section 2 presents the background topics related to 
mining social networks for anomalies. Section 3 
discusses the existing works on subgraph anomaly 
detection in social networks. After presenting a dis- 
cussion on the open problems and research chal- 
lenges in Section 4, we conclude the review in Sec- 
tion 5. 
2. Technical Preliminaries


In this section, we discuss the following background 
topics related to anomalous subgraph detection: 
Social Network, Graph Theory, Anomaly Detection 
in Graphs, Types of Anomalies in Social Network 
and Anomalous Subgraphs. 


2.1. Social Network 


A social network is made up of a number of participants or agents, known as nodes, who are connected by various kinds of connections or links (R. J. Wilson). This is depicted in Figure 1. It represents interaction between social entities. These actors or participants could be individuals, organizations, communities, and so on. The relationship between these entities could be of any kind, such as friendship, common interests, or shared beliefs, among others. Understanding and analysing a social structure, such as identifying local and global traits, influential actors, and network dynamics, is made easier when it is viewed as a network. Some examples of social networks are friends’ networks (Mislove et al.), telephone networks (Nanavati et al.), and e-mail networks (Al-Mukhaini, Al-Qayoudhi, and Al-Badi), to name a few. The study of social networks’ characteristics is known as social network analysis (R. J. Wilson). It enables us to look at the interactions between those connected via social networks and to gain an understanding of the patterns present. Graphs are usually used to model social networks.


FIGURE 1. Representation of a social network with various users interconnected with each other.


2.2. Graph Theory


In a very basic sense, a graph is simply a set of points connected by a set of lines. According to (Biggs, Lloyd, and R. Wilson) (Bondy and Murty), a simple graph G consists of a non-empty finite set V(G) of elements called vertices (or nodes), and a finite set E(G) of distinct unordered pairs of distinct elements of V(G) called edges. V(G) is called the vertex set and E(G) is called the edge set of the graph G. An edge {v, w} is said to join the vertices v and w, and is commonly written as vw. (Grubbs) defines a graph as an ordered triple (V(G), E(G), ψ_G) consisting of a nonempty set V(G) of vertices, a set E(G), disjoint from V(G), of edges, and an incidence function ψ_G that associates with each edge of G an unordered pair of (not necessarily distinct) vertices of G. For example, Figure 2 represents a simple graph with six vertices and seven edges.


Several additional graph-theoretic notions that are crucial for this review are also covered here, as specified in (Biggs, Lloyd, and R. Wilson). Two graphs are said to be isomorphic if there is a one-to-one correspondence between their vertices such that the number of edges joining any two vertices in one graph equals the number of edges joining the corresponding vertices in the other. An example is shown in Figure 3. Two vertices of a graph are said to be adjacent if there is an edge joining them, and those vertices are then incident with that edge. In a similar manner, two distinct edges are adjacent if they have a vertex in common. The degree of a vertex is the number of edges incident with it. An example is shown in Figure 4. A subgraph of a graph is a graph whose edges are all members of the parent graph’s edge set and whose vertices are all members of the parent graph’s vertex set. For example, Figure 5 denotes a subgraph of the graph in Figure 2 obtained by deleting the vertices E and F.
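The definitions above (vertex set, edge set, adjacency, degree, and a subgraph obtained by deleting vertices) can be illustrated with a minimal sketch. The six-vertex, seven-edge graph below is hypothetical: its edges are chosen for illustration and are not claimed to match Figure 2, and the helper names are assumptions.

```python
# Hypothetical simple graph with 6 vertices and 7 edges (illustrative only;
# not claimed to reproduce Figure 2).
vertices = {"A", "B", "C", "D", "E", "F"}
edges = {frozenset(e) for e in [
    ("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
    ("C", "D"), ("D", "E"), ("E", "F"),
]}


def degree(v, edge_set):
    """Number of edges incident with vertex v."""
    return sum(1 for e in edge_set if v in e)


def adjacent(u, v, edge_set):
    """Two vertices are adjacent if an edge joins them."""
    return frozenset((u, v)) in edge_set


def delete_vertices(vertex_set, edge_set, removed):
    """Subgraph left after deleting vertices and every edge incident with them."""
    kept_vertices = vertex_set - removed
    kept_edges = {e for e in edge_set if e <= kept_vertices}
    return kept_vertices, kept_edges


# Deleting E and F keeps only the edges among A, B, C and D.
sub_vertices, sub_edges = delete_vertices(vertices, edges, {"E", "F"})
```

Storing each edge as a `frozenset` makes the pair unordered, which matches the definition of an edge in a simple graph.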


2.3. Anomaly Detection in Graphs 


Anomaly is a term whose definition varies with the author, the context, and the application. (Barnett and Lewis) defines an outlier as an observation that stands out significantly from the other observations in the sample. According to (John), it is an observation (or selection of observations) that does not seem to match the other data. According to (Aggarwal and P. S. Yu), an outlier can also be described as surprising real-world data, as when a point that actually belongs to class A is mistakenly classified as belonging to class B, surprising the observer. As in Figure 6, outliers are either noise points that lie outside a specified set of clusters, or points that lie outside the specified set of clusters but are also distinct from the noise (Chandola, Banerjee, and Kumar). (Savage et al.) defines anomalies as data patterns that do not fit a recognised pattern of expected behaviour. (Vengertsev and Thakkar) defines them as portions of the network whose structure differs from what one would expect from the network’s typical structure. Anomaly detection refers to the problem of locating patterns or substructures that are unexpected or undesirable and that should be identified to safeguard the network and its users (Kaur and Singh).


FIGURE 2. A simple graph with 6 vertices and 7 edges.

FIGURE 3. Isomorphic graphs under the correspondence u↔l, v↔m, w↔n, x↔q, y↔r, z↔p.

FIGURE 4. Adjacent vertices u and v; degree of each of u and v is 3; adjacent edges e and f.

FIGURE 5. Subgraph of the graph in Figure 2 obtained by deleting the vertices E and F.


FIGURE 6. Example of anomalies in a simple two-dimensional graph (labelled regions: normal group, anomalous group).


2.4. Types of Anomalies in Social Network 


Based on a variety of factors, anomalies can be divided into many categories (Gao et al.). The three primary categories of anomaly are point anomaly, contextual anomaly, and collective anomaly. The categorisation depends on the nature and extent of the anomalous behaviour (Savage et al.). A point anomaly is a single data point or user that behaves differently from the rest of the data. Contextual anomalies are conditional anomalies that show up when a data object deviates significantly from its context. When a group of data items behaves differently from other groups, even though


International Research Journal on Advanced Science Hub (IRJASH)


Study of Anomalous Subgraph Detection in Social Networks 


the individual data items themselves might not be abnormal, this is called a collective anomaly. Anomalies, as in (Vengertsev and Thakkar), can be categorised as static or dynamic according to the network topology being used, as well as labelled or unlabelled depending on the type of information provided at a node or an edge. White crow anomalies and in-disguise anomalies, a different kind of anomaly, were introduced in (W. Eberle and L. Holder). A “white crow anomaly” arises when one data object differs so significantly from the other observations that it would seem almost impossible in that setting. An in-disguise anomaly is an extremely subtle deviation from normal behaviour that is difficult to detect. Anomalies can also be categorised by their graph properties or by structural operations such as insertion, deletion, and modification (Akoglu, Mcglohon, and Faloutsos). Anomalies in (Ma et al.) such as Near Stars/Cliques, Heavy Locality, and Particular Dominant Links are based on types of communication and linkages among nodes. Near Cliques have neighbours that are almost entirely linked to one another, whereas Near Stars have neighbours that are almost completely unconnected to one another. A Particular Dominant Link implies significant load around a certain entity, whereas Heavy Locality implies abnormally heavy load around a specific group. Deciding which parameter must be considered to define the categories of anomalies depends on the application for which the abnormalities are discovered. The detection approaches are used to find anomalous units in networks, such as edges, nodes, subgraphs, and/or events (Wasserman and Faust). When identifying users whose behaviour deviates considerably from the norm, we treat a group of nodes as anomalies. If we need to identify unexpected or irregular interactions between users, a subset of edges may be considered anomalous. Anomalous subgraph identification seeks subnetworks where the way the nodes interact differs from how they interact throughout the rest of the network. Events in dynamic networks are the time intervals in which the social network diverges considerably from the previous and following networks in the sequence.
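As an illustration of the Near Star and Near Clique categories, the sketch below measures how densely the neighbours of a node are linked to one another. It assumes the usual reading in which a star-like neighbourhood has almost no links among the neighbours and a clique-like neighbourhood has almost all of them. The function names and the cut-off values `low` and `high` are hypothetical choices for this example, not values taken from the cited works.

```python
from collections import defaultdict


def build_adjacency(edge_list):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def egonet_density(adj, ego):
    """Fraction of possible neighbour-to-neighbour links that are present."""
    nbrs = adj[ego]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Count each linked neighbour pair once (u < v).
    links = sum(1 for u in nbrs for v in adj[u] if v in nbrs and u < v)
    return links / (k * (k - 1) / 2)


def classify_egonet(adj, ego, low=0.1, high=0.9):
    """Label a node's neighbourhood; the cut-offs are illustrative assumptions."""
    density = egonet_density(adj, ego)
    if density <= low:
        return "near-star"
    if density >= high:
        return "near-clique"
    return "ordinary"


star = build_adjacency([("h", "a"), ("h", "b"), ("h", "c"), ("h", "d")])
clique = build_adjacency([("p", "q"), ("p", "r"), ("p", "s"),
                          ("q", "r"), ("q", "s"), ("r", "s")])
mixed = build_adjacency([("x", "y"), ("x", "z"), ("x", "w"), ("y", "z")])
```

Here `classify_egonet(star, "h")` sees no links among the four leaves, while `classify_egonet(clique, "p")` sees every pair of neighbours linked.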


2.5. Anomalous Subgraphs 


In reality, anomalous users may collaborate and act in concert with others to gain advantages. Fake users in a review-site network, for example, may publish fraudulent reviews to promote or disparage specific products. When such data are represented as graphs, the interactions among these users typically form suspicious subgraphs (Ranshous et al.). Because it is extremely difficult to enumerate every conceivable subgraph in even a single graph, discovering subgraphs with unexpected behaviour requires a different approach than detecting anomalous vertices or edges (Greene, Doyle, and Cunningham). As a result, the subgraphs that are analysed or identified are usually restricted, for example to those discovered with community detection methods. In these cases, matching algorithms, such as the community matching approach, are required to track the subgraphs through time steps (Cook and L. B. Holder). To detect subgraph anomalies in static social networks, techniques such as the network structure-based approach and the signal processing-based approach are used, whereas in dynamic social networks, the community-based approach, the matrix/tensor decomposition-based approach, the probability-based approach, and so on are used. In the last several years, a lot of work has gone into employing deep learning approaches to solve this problem (Wasserman and Faust). Owing to the versatility of heterogeneous graphs in depicting intricate interactions between various types of real objects, deep network representation approaches have been utilised in several recent articles to identify real-world abnormalities (Ranshous et al.).


3. Existing methods of Subgraph Anomaly 
Detection 


Almost all research papers on subgraph anomaly detection have been considered for this study. Since subgraph anomaly detection is a somewhat under-explored area, few works have been carried out, and even fewer use deep learning techniques. Most non-deep-learning works can be broadly classified into methods applied to static or dynamic graphs, and to attributed or non-attributed graphs. Table 1 summarises the techniques reviewed and their limitations.


TABLE 1. Summary of reviewed anomalous subgraph detection techniques

| Input | Technique | Limitations |
|---|---|---|
| | Non-deep learning methods | |
| Static unattributed graph | Minimum description length (Noble and Cook) | Does not work for numeric values or continuous attributes |
| Static unattributed graph | Minimum description length (Rattigan and Jensen) | Cannot operate on unweighted graphs with discrete vertex and edge labels |
| Static unattributed graph | Randomized graph traversal (Thompson and Eliassi-Rad) | Cannot detect in time-evolving social networks |
| Dynamic unattributed graph | Product rule for the central limit theorem (Miller, Bliss, and Wolfe) | Does not study edge correlations |
| Static unattributed graph | Eigenvector L1 norms (Newman) | Cannot detect subgraphs that can be separated from the background in the space of a small number of eigenvectors |
| Static attributed graph | Extension of Subdue (Mongiovi et al.) | Cannot be used for online detection of anomalies using dynamic graphs |
| Dynamic attributed graph | Heaviest subgraph detection with fixed-length moving window (Gupta et al.) | Not realistic in dynamic running conditions and system operations |
| Static attributed graph | Query-based paradigm using egonets (Zhao and Han) | Cannot be used on temporal graphs or high-dimensional data |
| Static unattributed graph | Signal processing on Chung-Lu random networks (Hong) | Connections between anomalous nodes are not established |
| Static attributed graph | Tree approximation and dynamic programming (Berk and Jones) | Cannot be used on dynamic multi-attributed heterogeneous networks |
| | Deep learning methods | |
| Static unattributed graph | Dense block detection approach (Ester et al.) | Does not work on non-bipartite graphs |
| Static unattributed graph | Dense block detection approach (Akoglu, Tong, and Koutra) | Does not work on non-bipartite graphs |
| Dynamic unattributed graph | Residual matrix-based convolutional neural network (H. Wang et al.) | Cannot be used on attributed graphs |

In Subdue (Noble and Cook), frequent substructures are found via a greedy beam search and are then rated according to the Minimum Description Length (MDL) principle. Substructures that are more frequent in the graph have lower Description Lengths (DL), which implies that substructures with a high DL are more anomalous. (W. Eberle and L. Holder) presented an algorithm based on this heuristic for anomaly detection. At the start it maintains a parent list, an ordered list of all detected substructures. The substructures are repeatedly removed from the parent list, and their extensions are generated, evaluated, and then added to the list. A second list of the best substructures found so far is kept up to date as new substructures are produced. The substructure with the highest value is reported, and before the next iteration starts, each instance of that substructure is replaced with a new vertex representing it. The method operates under the premise that an anomalous substructure contains few common patterns and therefore experiences less compression than other subgraphs, which makes it easier to detect. However, it does not work for numeric values or continuous attributes.
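The intuition that frequent substructures compress well while rare ones stand out can be shown with a deliberately simplified sketch. It is not Subdue: there is no beam search and no MDL encoding. It only counts the smallest possible labelled substructure, a single edge described by the labels of its two endpoints, and ranks the rarest pattern first. The labels, the edge list, and the function names are hypothetical.

```python
from collections import Counter


def pattern_of(edge, labels):
    """Describe an edge by the (sorted) labels of its two endpoints."""
    u, v = edge
    return tuple(sorted((labels[u], labels[v])))


def rank_edge_patterns(edges, labels):
    """Return (pattern, count) pairs, rarest pattern first.

    A rare pattern offers little compression, so under the frequency
    heuristic it is treated as the more anomalous one.
    """
    counts = Counter(pattern_of(e, labels) for e in edges)
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


# Hypothetical labelled graph: three users, one page, one bot account.
labels = {1: "user", 2: "user", 3: "user", 4: "page", 5: "bot"}
edges = [(1, 2), (2, 3), (1, 3), (1, 4), (2, 4), (3, 4), (5, 4)]

ranked = rank_edge_patterns(edges, labels)
```

In this toy graph the single bot-page edge is ranked first, because every other edge pattern occurs three times.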


(Rattigan and Jensen) developed three approaches for graph-based data fraud prevention and detection. For the purpose of anomaly recognition, they classify graph changes into three categories: modification, vertex/edge deletion, and vertex/edge insertion. Each algorithm focuses on one of these subtypes. They primarily employ the minimum description length principle (Shrivastava, Majumder, and Rastogi) to find the normative pattern, and then take a different route to find each specific kind of anomaly. The first algorithm uses the normative pattern to look for patterns whose cost of transformation is below a given threshold. Patterns with a lower value for a combination of cost and frequency are more anomalous. The second algorithm examines and assesses the likelihood of extensions of normative substructures. Less probable patterns are more anomalous. The third algorithm chooses patterns that are ancestors of the normative substructure, that is, patterns that match the largest possible part of the normative pattern. Patterns with lower transformation costs are more anomalous. Each approach can detect anomalies in graphs of various sizes with high detection accuracy and low false positive rates, but none can operate on unweighted graphs with discrete vertex and edge labels.


(Thompson and Eliassi-Rad) provide two effective methods to mine subgraphs satisfying the Random Link Attack (RLA) property. In a wide variety of attacks on communication networks, the attacker node uses a random selection algorithm to choose a group of victim nodes to connect with. The primary feature that distinguishes the attack group from a social subgraph is the number of exterior triangles, which are formed with the rest of the network and consist of one attack node and two non-attackers. For a malicious node, the number of these triangles will be very small. The first technique, known as GREEDY, builds a potential attack cluster by iteratively attaching the nodes that are most strongly connected to the cluster. To avoid being found, an attack node will connect to numerous additional attack nodes located throughout the network. Consider a subset of attackers and a few victims, and a neighbourhood made up of victims, attackers, and a few good nodes: it is unlikely that many victims will have edges to the same good node. Nodes in the neighbourhood that are strongly linked to the subset will thus be either attackers or victims. If such a node has fewer triangles than a certain threshold, it is probably an attacker, since attackers form very few triangles. In the second method, referred to as triangle random walk (TRWALK), a randomised graph traversal is carried out, each time beginning at a suspect node. A triangle is randomly selected from the node’s neighbourhood, and the walk then moves to another triangle that shares an edge with it; this is repeated until a collection of nodes visited during the TRWALK is obtained. A walk that does not cross any exterior triangles is likely to yield an attack set. The subset is examined for an RLA instance before moving on to the next suspect.
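The observation that a malicious node takes part in very few triangles can be sketched as a simple filter. This is not the GREEDY or TRWALK procedure; it only counts the triangles through each node and flags well-connected nodes whose count is low. The parameters `min_degree` and `max_triangles` and the example graph are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edge_list):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def triangle_count(adj, node):
    """Number of triangles that pass through the given node."""
    return sum(1 for u, v in combinations(sorted(adj[node]), 2) if v in adj[u])


def suspected_attackers(adj, min_degree, max_triangles):
    """Nodes with many links but almost no triangles (illustrative thresholds)."""
    return sorted(
        n for n in adj
        if len(adj[n]) >= min_degree and triangle_count(adj, n) <= max_triangles
    )


# Two small friend groups (a, b, c) and (d, e, f), plus a node x that links
# to unrelated nodes a, d and g.
edges = [
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("d", "e"), ("d", "f"), ("e", "f"),
    ("x", "a"), ("x", "d"), ("x", "g"),
]
adj = build_adjacency(edges)
flagged = suspected_attackers(adj, min_degree=3, max_triangles=0)
```

Nodes `a` and `d` also have three links each, but each closes a triangle inside its own friend group, so only `x` is flagged.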


(Miller, Bliss, and Wolfe) employ a scalable method based on the Product Rule for the Central Limit Theorem to assess the likelihood of occurrences and identify anomalous activity in volatile time-evolving networks. To recognise an unusual occurrence, the method first develops a baseline for normal behaviour by finding persistent patterns, each of which is a group of vertices that form a connected component and communicate often. It then uses this baseline to identify unusual behaviour at both a local and a global level. From the database of the time-evolving network, which is a dynamic graph made up of a fixed set of vertices and a set of time-stamped edges, it builds a weighted “cumulative” graph. The cumulative graph includes all previous edges but gives more weight to recent ones, making it useful for estimating average connection strength. It extracts persistent patterns by taking connected edges whose weights exceed a certain threshold together with regularly recurring edges. To identify anomalies, the present activity at a given time is compared with the activity anticipated from prior behavioural trends. If the actual activity differs noticeably from the anticipated activity, the occurrence is defined as anomalous. A specified anomaly threshold is used in this comparison, and anomalous behaviour is flagged for examination and analysis.
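A cumulative graph that keeps every past edge but favours recent ones can be sketched as follows. The exponential decay used here is an assumption made for illustration, since the text does not state the exact weighting; the function names, the `decay` factor, and the threshold are likewise hypothetical.

```python
from collections import defaultdict


def cumulative_weights(timestamped_edges, now, decay=0.5):
    """Weight each edge by the sum of decay**(now - t) over its occurrences.

    Older occurrences contribute less, so recent activity dominates.
    The exponential form is an assumption of this sketch.
    """
    weights = defaultdict(float)
    for u, v, t in timestamped_edges:
        if t <= now:
            weights[frozenset((u, v))] += decay ** (now - t)
    return dict(weights)


def persistent_edges(weights, threshold):
    """Edges whose cumulative weight reaches the threshold."""
    return {edge for edge, w in weights.items() if w >= threshold}


# a-b communicate at every time step; c-d communicate once, long ago.
events = [("a", "b", 1), ("a", "b", 2), ("a", "b", 3), ("a", "b", 4),
          ("c", "d", 1)]
weights = cumulative_weights(events, now=4)
persistent = persistent_edges(weights, threshold=1.0)
```

With `decay=0.5`, the recurring a-b edge accumulates 1 + 0.5 + 0.25 + 0.125 = 1.875, while the single old c-d contact contributes only 0.125 and is not treated as persistent.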


(Newman) offers a framework that presents a signal processing-based detection theory for anomalies in unweighted, undirected graphs, applying the L1 properties of the eigenvectors of the modularity matrix of the graph (Davis et al.). This measure is shown to have a reasonably low variance for several kinds of randomly generated graphs and to accurately detect the presence of an anomalous subgraph when it is not closely connected to stronger sections of the background graph. The analysis of the principal eigenvectors of the modularity matrix can also reveal the presence of a small, tightly connected component embedded in the larger graph, by projecting the large graph into the space of its two principal eigenvectors, computing a chi-squared test statistic, and comparing the result to a threshold. The “strength” with which a vertex belongs to a community is correlated with the size of the vertex’s component in an eigenvector. As a result, if a small group of vertices forms a community, with few of them belonging to other communities, an eigenvector that is well aligned with this group will exist. This implies that the L1 norm of this eigenvector will be lower than the L1 norm of an eigenvector with a similar eigenvalue when there is no abnormally dense subgraph. So, the subgraph detection algorithm calculates the eigendecomposition of the graph’s modularity matrix, determines the L1 norm of each eigenvector, subtracts the anticipated value, and normalises the result by the L1 norm. The presence of an anomalous subgraph embedding is indicated if any of these modified L1 norms falls below a predetermined threshold.
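To make the role of the L1 norm concrete, the following worked relations may help. The symbols are assumptions introduced here: A is the adjacency matrix, k the vector of vertex degrees, m the number of edges, n the number of vertices, and u a unit-length eigenvector of the modularity matrix B.

```latex
B = A - \frac{k k^{\mathsf{T}}}{2m}, \qquad
\lVert u \rVert_1 = \sum_{j=1}^{n} \lvert u_j \rvert, \qquad
1 \le \lVert u \rVert_1 \le \sqrt{n}
\quad \text{whenever } \lVert u \rVert_2 = 1 .

% If u is spread evenly over only s vertices, each nonzero entry has
% magnitude 1/\sqrt{s}, so
\lVert u \rVert_1 = s \cdot \frac{1}{\sqrt{s}} = \sqrt{s} .
```

An eigenvector concentrated on a small community of s vertices therefore has an L1 norm near the square root of s, well below the value near the square root of n reached by a vector spread evenly over the whole graph, which is why an unusually small L1 norm points to a small, dense subgraph.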


(Mongiovi et al.) described an algorithm that analyses labelled graphs for structural and numerical anomalies. By assigning anomaly scores, it extends the original Subdue method to cover numerical outliers. It distinguishes normal values from anomalous ones by transforming the graph so that all normal edges take a constant value while anomalous edges take a range of values computed using K-Nearest Neighbours. A collection of feature vectors is created from the characteristics of the vertices or edges under scrutiny. The feature vector of each vertex is compared to the full set to obtain the k-distance, the distance to the vertex’s kth closest neighbour. If the k-distance is normal, the constant value is returned; if not, the outlierness index is calculated using this distance. Each vertex is given an anomaly score, and the vertices are then split into sets so that vertices of similar type are grouped together. A vertex’s type may depend on the label assigned to it, the kind of edges it is incident with, or a variety of other criteria. The same procedure is applied to edges. The Subdue technique is then run on the result to obtain common substructures, and the anomaly scores of the compressed graph are computed. By replacing numerical edge weights with anomaly scores, a weighted graph can be used to identify both structural and numerical abnormalities. The method only works with static graphs, though.
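The k-distance step can be sketched for one-dimensional numeric values. The sketch assumes that a k-distance at or below a cut-off counts as normal and maps to a constant score, and that the k-distance itself serves as the outlierness value otherwise. Both choices, together with the names `normal_cutoff` and `normal_score`, are illustrative assumptions rather than the scoring used in the cited work.

```python
def k_distance(values, i, k):
    """Distance from values[i] to its k-th closest other value."""
    distances = sorted(abs(values[i] - x) for j, x in enumerate(values) if j != i)
    return distances[k - 1]


def anomaly_scores(values, k=2, normal_cutoff=1.0, normal_score=0.0):
    """Constant score for normal values, k-distance for the rest.

    The cut-off and the use of the raw k-distance as the outlierness
    value are assumptions of this sketch.
    """
    scores = []
    for i in range(len(values)):
        d = k_distance(values, i, k)
        scores.append(normal_score if d <= normal_cutoff else d)
    return scores


# Hypothetical edge weights: four similar values and one numeric outlier.
weights = [10.0, 10.5, 11.0, 10.2, 50.0]
scores = anomaly_scores(weights, k=2)
```

The four similar weights all collapse to the same constant, so a substructure miner would treat them as one repeated pattern, while the weight 50.0 keeps a distinct, large score.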


(Gupta et al.) suggest a method to create a thorough list of all important anomalous regions in a dynamic network. It begins by modelling the regular behaviour of network edges, scores edges over time according to how unusual their behaviour is, and then computes extended regions of anomalous edges in order to locate anomalous regions. Given an edge and its weight at a specific time, the p-value is calculated as the fraction of timestamps at which the same edge has an equal or greater weight. The more abnormal the observation, the lower its p-value. The approach is based on the NetAmoeba technique, which approximates the Heaviest Dynamic Subgraph. In its final step, a maximum-score subsequence computation takes a given subgraph and finds the time interval that yields the highest score for it. The method takes as inputs a score threshold and a parameter that sets the number of failures that must occur before stopping, and outputs a collection of anomalous regions whose scores exceed the threshold. It runs NetAmoeba iteratively, beginning each search with a seed generation process. After each run, the positive weights of edges located inside the newly found region are removed from the network. The algorithm stops once the regions it finds score below the threshold the specified number of times in a row, since it is then unlikely that a region with a score above the threshold will still be found.
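Two ingredients of this description can be sketched in isolation: the empirical p-value of an edge weight, and the search for the time interval with the highest total score. The second function is the standard maximum-sum contiguous subsequence scan (Kadane's algorithm), used here as a stand-in; it is not NetAmoeba, and the function names and sample data are hypothetical.

```python
def edge_p_value(history, t):
    """Fraction of timestamps at which the edge weight is >= its weight at t."""
    observed = history[t]
    return sum(1 for w in history if w >= observed) / len(history)


def max_score_interval(scores):
    """Contiguous interval with the largest total score.

    Returns (best_total, start_index, end_index), indices inclusive.
    """
    best, best_start, best_end = scores[0], 0, 0
    running, start = 0.0, 0
    for i, s in enumerate(scores):
        if running <= 0:
            # A non-positive prefix cannot help, so restart the interval here.
            running, start = s, i
        else:
            running += s
        if running > best:
            best, best_start, best_end = running, start, i
    return best, best_start, best_end


# Weight of one edge over six timestamps; the spike at index 4 is unusual.
history = [1, 1, 2, 1, 9, 1]
p_spike = edge_p_value(history, 4)
p_usual = edge_p_value(history, 0)

# Per-timestamp anomaly scores of a fixed subgraph (positive means unusual).
interval = max_score_interval([-1.0, 2.0, 3.0, -4.0, 1.0])
```

The spike has a p-value of 1/6 because only one of the six timestamps reaches its weight, whereas the ordinary weight at index 0 has a p-value of 1.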


(Zhao and Han) suggest a way of determining a subgraph’s outlierness using a max-margin framework. To compute the outlier scores used to rank subgraph outliers, the method compares the margin between linked and non-linked node pairs in the neighbourhood of a subgraph match. The study focuses on query-based outliers exploiting neighbourhood data, giving the user the freedom to find outliers that conform to a particular structure and set of conditions expressed in the form of a query. The first step is to find every instance of the query in the given entity-relationship graph. An SPath-based solution (B. Wang et al.) provides the collection of all matches for the query. The set of all induced matches is then simple to compute by listing all the graph edges that each match covers. The next step is to calculate the outlier score for each match, using the margin of the max-margin hyperplane, that is, of the best feature weight vector. After the matches are sorted in non-ascending order of outlier score, the top few matches can be returned as outliers. The suggested approach applies only to static networks and is therefore incompatible with temporal networks.


(Hong) provides a pre-processing method in which a
subgraph search extracts a local vertex set that is
highly likely to contain the anomalous vertices.
Detection based on this local set can perform much
better, because the relational data restricted to
the local vertex set has low noise power while
losing only a modest amount of signal power. The
sparse background graph is modelled by a Chung-Lu
random graph and contains a dense anomalous subgraph
modelled by the Erdős-Rényi model. A few a priori
adjacency matrices, which describe the anomalous
vertices, are assumed to be known. Using this a
priori data, the subgraph search condenses the
global vertex set into a small set. Each candidate
vertex set starts from the vertex with the largest
anomalous coefficient. A vertex adjacent to the
starting vertex is then chosen according to the
largest updated coefficient, and the remaining
vertices are added in the same way. Among all
candidate sets, the one with the highest coefficient
is picked as the most anomalous. This is repeated
for each graph snapshot to produce a number of local
sets, which are then merged into the final set. A
detection statistic is applied to this final vertex
set to decide whether the graph is anomalous. The
detection statistic is a random variable with a
Poisson binomial distribution (Shao et al.).
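
A Poisson binomial distribution is the distribution
of a sum of independent Bernoulli variables with
unequal success probabilities. The Python sketch
below computes its probability mass function and an
upper tail probability by dynamic programming. It
rests on assumptions not stated above: that the
statistic counts edges inside the final vertex set,
and that each edge is present independently with its
own probability (as in a Chung-Lu background). The
function names are illustrative.

```python
from typing import List, Sequence


def poisson_binomial_pmf(probs: Sequence[float]) -> List[float]:
    """pmf[k] = P(exactly k successes) for independent Bernoulli(probs[i])."""
    pmf = [1.0]  # with zero trials, zero successes is certain
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this trial fails
            nxt[k + 1] += mass * p      # this trial succeeds
        pmf = nxt
    return pmf


def upper_tail(probs: Sequence[float], count: int) -> float:
    """P(number of successes >= count) under the null edge probabilities."""
    return sum(poisson_binomial_pmf(probs)[count:])
```

A small upper tail probability for the observed edge
count would indicate that the vertex set is denser
than the background model explains.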


To handle anomaly identification in multi-attributed
networks, (Berk and Jones) propose a generic
framework called multi-attributed anomalous
subgraphs and attributes scanning (MASA). It
identifies an anomalous connected subgraph together
with the associated subset of anomalous attributes.
The framework optimises an anomaly scoring function
based on a class of nonlinear nonparametric scan
statistic functions. The Berk-Jones statistic (Neil
et al.) serves as a case study. How anomalous an
attribute is depends on its empirical p-value,
calculated as the proportion of historical
observations with a greater or equal value on this
attribute (Luan et al.). The functions that score
the anomalousness of a subgraph and its
corresponding subset of attributes are formulated
with nonparametric scan statistics, which measure
the joint anomalousness of a set of p-values. Using
tree approximation priors, the graph is approximated
by a tree grown from a randomly chosen root node;
the Steiner tree is used as a case study. For the
functions above, finding the most anomalous
connected subgraph and its attributes can then be
approximated by finding the best subtree of this
tree and its associated attributes. Intuitively, the
smaller an attribute’s p-value, the more anomalous
it is.
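
The Berk-Jones nonparametric scan statistic is
commonly written as N * KL(N_alpha / N, alpha), where
N is the number of p-values in the candidate subset,
N_alpha is how many of them are significant at level
alpha, and KL is the Kullback-Leibler divergence
between two Bernoulli distributions. The Python
sketch below scores a fixed set of p-values this way.
It is not the MASA implementation: the function
names, the use of "<= alpha" for significance, and
the default alpha_max are assumptions made for
illustration, and the search over subgraphs is left
out.

```python
import math
from typing import Sequence


def kl_bernoulli(a: float, b: float) -> float:
    """KL divergence between Bernoulli(a) and Bernoulli(b), with 0 < b < 1."""
    total = 0.0
    if a > 0:
        total += a * math.log(a / b)
    if a < 1:
        total += (1 - a) * math.log((1 - a) / (1 - b))
    return total


def berk_jones(p_values: Sequence[float], alpha: float) -> float:
    """Berk-Jones score of a set of p-values at significance level alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    n = len(p_values)
    if n == 0:
        return 0.0
    frac = sum(1 for p in p_values if p <= alpha) / n
    if frac <= alpha:
        return 0.0  # no more significant p-values than expected by chance
    return n * kl_bernoulli(frac, alpha)


def berk_jones_scan(p_values: Sequence[float], alpha_max: float = 0.15) -> float:
    """Maximise the score over alpha; only observed p-values need checking."""
    candidates = sorted({p for p in p_values if 0 < p <= alpha_max})
    return max((berk_jones(p_values, a) for a in candidates), default=0.0)
```

A subset whose p-values are mostly small receives a
high score, while p-values spread uniformly over
[0, 1] score close to zero.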


The first attempt to use deep learning for subgraph
anomaly detection was made by (H. Wang et al.).
Anomalous subgraph detection is formalised as a
binary hypothesis test, in which the null hypothesis
is that the observed graph is normal and the
alternative hypothesis is that the background graph
contains an anomalous subgraph. They present a
framework for detecting subgraphs using Deep Neural
Networks (DNN) that comprises an offline training
phase and an online detection phase. In the offline
phase, a training set is built according to the
specific form of the neural network, and samples are
passed to the hidden layers to generate feature maps
that capture the state of the graph. The optimal
detection statistic for identifying anomalous
subgraphs is derived from the Neyman-Pearson lemma,
and the DNN parameters are optimised by
backpropagation. In the online phase, the observed
sample is fed to the trained DNN to construct the
feature vector, from which the detection statistic
is computed. Comparing this statistic with a
threshold decides whether the observed graph
contains an anomalous subgraph. Based on this
framework, an algorithm known as the residual
matrix-based convolutional neural network (RM-CNN)
was created, which detects anomalous behaviour in
the graph with the maximum detection probability for
a given false alarm probability.
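
The final thresholding step can be illustrated
independently of the neural network. The Python
sketch below assumes that scalar detection statistics
are available for a set of graphs known to be normal
(the null hypothesis) and picks the threshold
empirically so that the false alarm rate on those
samples does not exceed a target. This is a generic
calibration under stated assumptions, not the RM-CNN
procedure, and the function names are hypothetical.

```python
import math
from typing import Sequence


def calibrate_threshold(null_stats: Sequence[float], p_false_alarm: float) -> float:
    """Smallest observed value t such that the fraction of null
    statistics strictly above t is at most p_false_alarm."""
    if not null_stats:
        raise ValueError("null_stats must be non-empty")
    if not 0 <= p_false_alarm < 1:
        raise ValueError("p_false_alarm must lie in [0, 1)")
    ordered = sorted(null_stats)
    n = len(ordered)
    # Number of null samples that are allowed to exceed the threshold.
    allowed = min(math.floor(p_false_alarm * n), n - 1)
    return ordered[n - allowed - 1]


def detect(statistic: float, threshold: float) -> bool:
    """Decide for the alternative hypothesis (anomalous subgraph present)."""
    return statistic > threshold
```

Lowering the target false alarm probability raises
the threshold, which trades missed detections for
fewer false alarms.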


(Ester et al.) aim to learn user representations
such that benign users are scattered over the vector
space while suspicious users belonging to the same
group lie close to one another. The proposed model,
DeepFD, measures the behavioural similarity of two
users as the proportion of items they share among
all the items they have reviewed. This is motivated
by the observation that user nodes belonging to the
same fraudulent group are much more likely to
connect to the same item nodes. User representations
are then produced by an autoencoder that follows the
encoding-decoding procedure and is trained with
three losses. The first, the reconstruction loss,
ensures that the bipartite graph structure can be
reconstructed correctly from the learned user and
item representations. The second preserves user
similarity information in the learned user
representations: if two users behave similarly,
their representations should be similar too. The
third loss, L_reg, regularises all trainable
parameters. Finally, the suspicious dense blocks,
which are expected to form dense regions in the
feature space, are found using DBSCAN (Zheng et
al.).
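
The behavioural similarity described above, shared
items as a proportion of all items reviewed by either
user, is the Jaccard similarity of the two users'
item sets. A minimal Python sketch follows; the
function names and the toy data are illustrative and
do not come from DeepFD itself.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def behaviour_similarity(items_a: Set[str], items_b: Set[str]) -> float:
    """Shared items divided by all items reviewed by either user."""
    union = items_a | items_b
    if not union:
        return 0.0  # two users with no reviews share nothing
    return len(items_a & items_b) / len(union)


def pairwise_similarity(user_items: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Similarity for every unordered pair of users, keyed by sorted names."""
    return {
        (u, v): behaviour_similarity(user_items[u], user_items[v])
        for u, v in combinations(sorted(user_items), 2)
    }


reviews = {
    "u1": {"a", "b", "c"},
    "u2": {"b", "c", "d"},
    "u3": {"x"},
}
similarities = pairwise_similarity(reviews)
```

Users from the same fraudulent group would be
expected to have similarities close to 1, which is
the signal the second loss tries to preserve in the
learned representations.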


(Akoglu, Tong, and Koutra) use the dense block
detection approach to detect both malicious users
and the associated manipulated items in online
review networks that are modelled as bipartite
graphs. Unlike DeepFD, FraudNE encodes both types of
nodes into a shared latent space, in which
suspicious users and items from the same dense block
lie close together while the remaining nodes are
scattered. FraudNE uses a source node autoencoder
and a sink node autoencoder to learn user and item
representations, respectively. Both autoencoders are
trained to minimise their respective reconstruction
losses together with a shared loss function. The
reconstruction losses measure the mismatch between
the input features extracted from the graph
structure and their decoded counterparts. The shared
loss function constrains the representation learning
so that each linked user-item pair receives similar
representations. FraudNE uses DBSCAN (Zheng et al.),
which is well suited to dense region identification,
to discriminate between dense sub-graphs created by
suspect


4. Open Problems and Research Challenges


Considering the numerous methodologies in the
literature, finding anomalies in social networks
consists of two distinct subprocesses: selecting a
suitable feature space and detecting anomalies in
that space (Vengertsev and Thakkar). However, it is
often unclear why a specific set of features was
chosen. Choosing an appropriate feature space can be
very challenging in practice, because few papers
explicitly justify the particular collection of
features they examine. Because many social network
properties are more or less correlated, it is
unclear which feature combinations should be used to
capture the essential concepts. Without an explicit
justification of why a particular behaviour should
show up in a particular network feature, there is no
reason to assume that the feature will behave
consistently across data sets representing different
social networks. Many of the approaches reviewed
were tested on only one or a few data sets and may
therefore be biased towards the particular anomalies
present in those data sets.


The enormous search space and the combinatorial
nature of enumerating the candidate graph
substructures associated with more complex anomalies
present yet another difficulty in identifying
anomalies [58]. When the graphs are attributed, the
possibilities expand further, because they span both
the attribute space and the graph structure.
Moreover, not all algorithms are applicable
everywhere, and many contemporary techniques were
developed with specific problem areas and data types
in mind. When comparing existing and novel
approaches, it is critical to consider how the size
of the network, the number of anomalies, and the
magnitude of the gap between normal and anomalous
data affect the performance of the algorithms. There
are currently very few publicly accessible data sets
with established ground truth that can be used for
such comparisons. As a result, many approaches are
assessed on a limited amount of data, and findings
are verified by manually inspecting the top-ranked
anomalies, which is extremely time-consuming and
highly dependent on the level of disclosure by the
target system’s owners. Although some data sets are
reasonably well characterised, so that anomalies
found in them can easily be compared with known
sequences of events, these data sets are typically
small and may suit only a subset of problem domains.
Creating large synthetic data sets may therefore be
a sensible way to overcome this difficulty.
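
One simple way to obtain synthetic data with known
ground truth is to plant a dense subgraph in a sparse
random background. The Python sketch below does this
with independent edges. It is only an illustration:
the function name and parameters are hypothetical,
and the uniform background is a simplifying
assumption, whereas a more realistic generator would
use a degree-aware background such as the Chung-Lu
model mentioned earlier.

```python
import random
from itertools import combinations
from typing import Optional, Set, Tuple


def planted_dense_subgraph(
    n: int,
    k: int,
    p_background: float,
    p_dense: float,
    seed: Optional[int] = None,
) -> Tuple[Set[Tuple[int, int]], Set[int]]:
    """Return (edges, anomalous_vertices) for a graph on vertices 0..n-1.

    Pairs inside the planted set are linked with probability p_dense,
    all other pairs with probability p_background.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    rng = random.Random(seed)
    anomalous = set(rng.sample(range(n), k))  # ground-truth labels
    edges: Set[Tuple[int, int]] = set()
    for u, v in combinations(range(n), 2):
        inside = u in anomalous and v in anomalous
        if rng.random() < (p_dense if inside else p_background):
            edges.add((u, v))
    return edges, anomalous
```

Because the anomalous vertex set is returned
alongside the edges, a detector's output can be
scored directly against it, and n, k, and the two
probabilities can be varied to study how network
size, anomaly size, and the gap between normal and
anomalous density affect performance.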


Despite the substantial amount of work done in this
domain, many problems remain for future work,
especially the handling of anomalies in dynamic
networks, where comparatively little progress has
been made (Gao et al.). Although some approaches
make use of temporal information, work on social
networks has paid little attention to the time
dimension. For each family of social network
techniques, such as behaviour-based,
structure-based, or spectral-based, there is still
room to explore additional graph metrics that could
reveal new kinds of anomalies in different social
networks. Detecting anomalies in massive social
network data is now attracting researchers’
interest, yet relatively little research has
addressed it so far. Current solutions either focus
on a predefined set of labelled data or examine the
activity of randomly chosen nodes, rather than
studying the irregular behaviour of data in social
networks. Node and edge anomalies have received
considerable attention, whereas subgraph anomalies
were long neglected and have only recently gained
ground. Deep learning has many potential
applications in this field and deserves
consideration in the years ahead.


Most modern techniques rely on deep learning, while
social networks are usually represented as graphs,
and combining the two creates a great deal of
complexity (Ranshous et al.). Because labelled
ground-truth anomalies are often unavailable for
research across a broad range of industries, there
is little or no prior knowledge of the features or
patterns of anomalies in real applications. Graph
anomalies exhibit different unusual patterns in
different types of graphs. Because there are several
types of graph anomalies, detection systems need
precise definitions of anomalies as well as the
ability to recognise reliable cues to the abnormal
patterns. These techniques must handle the
high-dimensional, large-scale data that real-world
networks typically produce, and must uncover
abnormal patterns within practical limits on
resources and computing time. Since real-world
networks tend to be dynamic, it is important to
consider the various relationships between objects
that are preserved in conventional graphs or
hypergraphs in order to account for the changing
patterns of anomalies. Detection systems should also
be robust to concealed anomalies and adaptable to
newly discovered ones.


The requirements for anomaly detection in social
media platforms will change quickly in the near
future as ever-growing data volumes and more complex
behaviours are taken into account. This may motivate
feature spaces with more complex structural
components. It will therefore be helpful to develop
guidelines or strategies for translating actual
behaviour into appropriate feature spaces. The fine
line between normal and anomalous users makes the
latter much harder to predict, which calls for more
powerful and innovative strategies. Anomalies must
not only be detected but also prevented, because
some domains or applications cannot tolerate any
compromise of their sensitive data. Such systems
must therefore be alert to abnormal or malicious
users long before they are actually discovered.
However, far more effort has gone into anomaly
detection than into anomaly prevention. Research
should therefore concentrate strongly on these
aspects in the coming years.


5. Conclusion 


This study provides a thorough analysis of the
methods that have been proposed for identifying
subgraph anomalies in social networks represented as
graphs. Because of the large size of the network and
its dynamic nature, mining social networks for
anomalies is a complex and computationally intensive
task. In the last decade, a wide range of algorithms
for detecting social network anomalies in various
problem settings have been introduced. This work
organises the state-of-the-art approaches and
briefly discusses the associated methodologies.
Starting with the technical background needed to
understand the work done in this domain, it moves on
to conventional anomaly detection approaches, which
were eventually supplanted by graph-based anomaly
detection because of its greater applicability and
efficiency. Following a long line of statistical
methodologies, deep learning has gradually made its
way into the domain and is now being used to detect
anomalies in graph-based networks, although research
on it remains limited. The paper also reviews the
research problems and open issues for future study
in this area, as well as how deep learning can be
used to detect anomalies in social networks in the
future. Given the number of techniques described,
choosing an algorithm is difficult. Many
application-specific considerations must be taken
into account, including the nature of the network
being analysed and the kinds of anomalies to be
found. This overview lists the methods that have
been developed for finding anomalies in social
networks, together with suggestions for making the
current methods more effective. Although a lot of
work has been done, much remains to be done in terms
of refinement and attention.